CRAMÉR’S THEOREM IS ATYPICAL


By Nina Gantert*, Steven Soojin Kim, and Kavita Ramanan
Technische Universität München, Brown University, and Brown University

The empirical mean of $n$ independent and identically distributed (i.i.d.)
random variables $(X_1, \ldots, X_n)$ can be viewed as a suitably normalized scalar
projection of the $n$-dimensional random vector $X^{(n)} := (X_1, \ldots, X_n)$ in the
direction of the unit vector $n^{-1/2}(1, 1, \ldots, 1) \in S^{n-1}$. The large deviation
principle (LDP) for such projections as $n \to \infty$ is given by the classical Cramér’s
theorem. We prove an LDP for the sequence of normalized scalar projections
of $X^{(n)}$ in the direction of a generic unit vector $\theta^{(n)} \in S^{n-1}$, as $n \to \infty$. This
LDP holds under fairly general conditions on the distribution of $X_1$, and for
“almost every” sequence of directions $(\theta^{(n)})_{n \in \mathbb{N}}$. The associated rate function
is “universal” in the sense that it does not depend on the particular sequence
of directions. Moreover, under mild additional conditions on the law of $X_1$,
we show that the universal rate function differs from the Cramér rate function,
thus showing that the sequence of directions $n^{-1/2}(1, 1, \ldots, 1) \in S^{n-1}$,
$n \in \mathbb{N}$, corresponding to Cramér’s theorem is atypical.


1. Introduction. Let $X^{(n)} = (X_1, \ldots, X_n)$ be a sequence of $n$ independent and identically
distributed (i.i.d.) $\mathbb{R}$-valued random variables with common distribution $\gamma \in \mathcal{P}(\mathbb{R})$. A
fundamental probabilistic question is how the empirical mean of $X^{(n)}$ behaves as the length
of the sequence $n$ increases. From a geometric perspective, the empirical mean is a suitably
normalized version of (the scalar component of) the projection of the $n$-dimensional vector
$X^{(n)}$ in the direction of the unit vector $\iota^{(n)}$ defined by

(1.1)    \iota^{(n)} := \frac{1}{\sqrt{n}} (\underbrace{1, 1, \ldots, 1}_{n \text{ times}}) \in S^{n-1}.

In other words, we can write 

(1.2)    W^{(n)}_\iota := \frac{1}{\sqrt{n}} \langle X^{(n)}, \iota^{(n)} \rangle_n = \frac{1}{n} \sum_{i=1}^n X_i,

where $\langle \cdot, \cdot \rangle_n$ denotes the Euclidean inner product. With some abuse of terminology, for
$x \in \mathbb{R}^n$ and $v \in S^{n-1}$, we hereby write the “projection of $x$ in the direction $v$” to refer to
the scalar component $\langle x, v \rangle_n \in \mathbb{R}$ (rather than the vector $\langle x, v \rangle_n v \in \mathbb{R}^n$). Then, the
expression (1.2) indicates that questions on the empirical mean for large $n$ can be rephrased in
a geometric language as questions on suitably normalized projections of high-dimensional
random vectors.

*NG and KR would like to thank ICERM, Providence, for an invitation to the program “Computational
Challenges in Probability”, where some of this work was initiated.

SSK was partially supported by a Department of Defense NDSEG fellowship.

KR was partially supported by ARO grant W911NF-12-1-0222 and NSF grant DMS 1407504.

SSK and KR would also like to thank Microsoft Research New England for their hospitality during the Fall
of 2014, when some of this work was completed.

Primary 60F10; secondary 60D05.

Keywords and phrases: large deviations, projections, high-dimensional product measures, Cramér’s
theorem, rate function.

The classical Cramér’s theorem characterizes the large deviations behavior of (1.2), the
empirical mean of i.i.d. random variables, as $n \to \infty$. In particular, if $X_1 \sim \gamma$ has some finite
exponential moments, in the sense that

(1.3)    \exists\, t_0 > 0 \text{ s.t. } \forall\, |t| < t_0, \quad \Lambda(t) := \log \mathbb{E}[e^{t X_1}] < \infty,

then we have the limit

    \lim_{n \to \infty} \frac{1}{n} \log \mathbb{P}(W^{(n)}_\iota > x) = -\Lambda^*(x),

where $^*$ denotes the Legendre transform,

(1.4)    \Lambda^*(x) := \sup_{t \in \mathbb{R}} \{ tx - \Lambda(t) \}.

We refer to [17, §12] for a review of the Legendre transform (also known as the convex
conjugate).
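
The Legendre transform in (1.4) can be approximated numerically by taking a maximum over a finite grid of $t$ values. The following Python sketch does this for the standard Gaussian log mgf $\Lambda(t) = t^2/2$, whose transform is $x^2/2$. The function names, the grid range and the grid step are illustrative assumptions, not part of the paper; a grid maximum can only underestimate the supremum.

```python
def legendre_transform(log_mgf, x, t_grid):
    """Approximate sup_t { t*x - log_mgf(t) } by a maximum over a finite grid of t values."""
    return max(t * x - log_mgf(t) for t in t_grid)


# Grid on [-5, 5] with step 0.001 (an arbitrary choice for this sketch).
t_grid = [-5.0 + 0.001 * k for k in range(10001)]


def gaussian_log_mgf(t):
    """Log mgf of a standard Gaussian: Lambda(t) = t^2 / 2."""
    return 0.5 * t * t


rate_at_one = legendre_transform(gaussian_log_mgf, 1.0, t_grid)   # close to 1^2 / 2
rate_at_zero = legendre_transform(gaussian_log_mgf, 0.0, t_grid)  # close to 0
```

For a log mgf that is finite only on part of the real line, the grid would have to be restricted to the domain of $\Lambda$.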

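Given the geometric view of empirical means given by (1.2), it is natural to investigate
analogs of Cramér’s theorem for normalized projections in directions $\theta^{(n)} \in S^{n-1}$ other than
$\iota^{(n)}$. Such projections correspond to weighted means,

(1.5)    W^{(n)}_\theta := \frac{1}{\sqrt{n}} \langle X^{(n)}, \theta^{(n)} \rangle_n = \frac{1}{n} \sum_{i=1}^n X_i \sqrt{n}\, \theta^{(n)}_i.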
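
As a small illustration of (1.2) and (1.5), the following Python sketch computes the normalized scalar projection for two directions: the Cramér direction $\iota^{(n)}$, which recovers the empirical mean, and the first canonical basis vector. The function names and the sample vector are hypothetical and chosen only for this example.

```python
import math


def normalized_projection(x, theta):
    """Return (1/sqrt(n)) * <x, theta>, the normalized scalar projection of x onto theta."""
    n = len(x)
    if len(theta) != n:
        raise ValueError("x and theta must have the same length")
    return sum(xi * ti for xi, ti in zip(x, theta)) / math.sqrt(n)


def cramer_direction(n):
    """The unit vector iota^(n) = n^{-1/2} (1, ..., 1) of (1.1)."""
    return [1.0 / math.sqrt(n)] * n


x = [2.0, -1.0, 4.0, 3.0]            # hypothetical sample of X^(n) with n = 4
w_iota = normalized_projection(x, cramer_direction(len(x)))
empirical_mean = sum(x) / len(x)     # agrees with w_iota, as in (1.2)

e1 = [1.0, 0.0, 0.0, 0.0]            # first canonical basis vector
w_e1 = normalized_projection(x, e1)  # equals x[0] / sqrt(n)
```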

Our main result is an LDP for $(W^{(n)}_\theta)_{n \in \mathbb{N}}$ for almost every (in a sense that is specified below)
sequence of directions $\theta = (\theta^{(1)}, \theta^{(2)}, \ldots)$. In particular, we show that the associated rate
function does not depend on $\theta$, and that it differs from the Cramér rate function $\Lambda^*$. That is,
the sequence of directions $\iota$ corresponding to Cramér’s theorem is “atypical”!

Remark 1.1. While the LDP for (1.5) is novel, the corresponding law of large numbers
(LLN) and central limit theorem (CLT) for weighted sums are well known. For example,
a weak LLN follows from Chebyshev’s inequality, and a CLT follows from the
Lindeberg condition (see, e.g., [11, §VIII.4, Theorem 3]).

The outline of this note is as follows. In Section 2, we state our main results and discuss
their relation to prior work. In Section 3, we prove the claimed LDP. In Section 4, we
establish that Cramér’s theorem is atypical, and also comment on a generalization that is
considered in [12].

2. Main results. We first set some notation. Suppose the random variables $X_1, X_2, \ldots$
are all defined on a common probability space $(\Omega, \mathcal{F}, \mathbb{P})$. Let $\|\cdot\|_n$ denote the Euclidean
norm on $\mathbb{R}^n$. Write $\sigma_{n-1}$ for the unique rotation invariant probability measure on $S^{n-1}$, the
unit sphere in $\mathbb{R}^n$. Let $\mathbb{S} := \prod_{n \in \mathbb{N}} S^{n-1}$, and let $\pi_n : \mathbb{S} \to S^{n-1}$ be the coordinate map such
that for $\theta = (\theta^{(1)}, \theta^{(2)}, \ldots) \in \mathbb{S}$, we have $\pi_n(\theta) = \theta^{(n)}$. Let $\sigma$ be a probability measure on
(the Borel sets of) $\mathbb{S}$ such that

(H1)    \sigma \circ \pi_n^{-1} = \sigma_{n-1}, \quad n \in \mathbb{N}.

The generic example to keep in mind that satisfies (H1) is the product measure $\sigma = \bigotimes_{n \in \mathbb{N}} \sigma_{n-1}$,
in which case the projection directions $\theta^{(n)}$, $n \in \mathbb{N}$, are independent under $\sigma$. However, our
results allow for more general dependencies; for more discussion on $\sigma$ and the condition
(H1), see Remark 3.3.

For $\sigma$-a.e. $\theta \in \mathbb{S}$, we prove a large deviation principle for the sequence $(W^{(n)}_\theta)_{n \in \mathbb{N}}$
with a rate function that does not depend on $\theta$. We refer to [9] for general background on
large deviations. In particular, recall the following definition:

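Definition 2.1. The sequence of probability measures $(\mu_n)_{n \in \mathbb{N}} \subset \mathcal{P}(\mathbb{R})$ is said to
satisfy a large deviation principle (LDP) with a rate function $I : \mathbb{R} \to [0, \infty]$ if $I$ is lower
semicontinuous, and for all Borel measurable sets $\Gamma \subset \mathbb{R}$,

    -\inf_{x \in \Gamma^\circ} I(x) \le \liminf_{n \to \infty} \frac{1}{n} \log \mu_n(\Gamma^\circ) \le \limsup_{n \to \infty} \frac{1}{n} \log \mu_n(\bar{\Gamma}) \le -\inf_{x \in \bar{\Gamma}} I(x),

where $\Gamma^\circ$ and $\bar{\Gamma}$ denote the interior and closure of $\Gamma$, respectively. Furthermore, $I$ is said to
be a good rate function if it has compact level sets.

We say the sequence of $\mathbb{R}$-valued random variables $(\xi_n)_{n \in \mathbb{N}}$ satisfies an LDP if the
sequence of laws $(\mu_n)_{n \in \mathbb{N}}$ given by $\mu_n = \mathbb{P} \circ \xi_n^{-1}$ satisfies an LDP.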

In particular, for empirical means of i.i.d. random variables, we recall the following
classical result, due to [7, 5].

Theorem 2.2 (Cramér). Let $(X_i)_{i \in \mathbb{N}}$ be an i.i.d. sequence such that (1.3) holds, and
let $\iota = (\iota^{(1)}, \iota^{(2)}, \ldots)$ be defined as in (1.1). Then the sequence $(W^{(n)}_\iota)_{n \in \mathbb{N}}$ of (1.2) satisfies
an LDP with the good rate function $I_\iota$, given by

(2.1)    I_\iota(w) := \Lambda^*(w) = \sup_{t \in \mathbb{R}} \{ tw - \Lambda(t) \}.

Let $\nu \in \mathcal{P}(\mathbb{R})$ denote the standard one-dimensional Gaussian measure. In the sequel,
we assume the following condition on $\Lambda$, the logarithmic moment generating function (log
mgf) of $X_1 \sim \gamma$:

(H2)    \forall\, t \in \mathbb{R}, \quad \int_{\mathbb{R}} |\Lambda(tu)|^4 \, \nu(du) < \infty.

Note that (H2) is stronger than even requiring the exponential moment condition in (1.3)
to hold with $t_0 = \infty$. For an absolutely continuous $\gamma$ with density, a sufficient condition for
(H2) is that the decay of the tail of the density is strictly faster than exponential, in the
following sense:

Lemma 2.3. Suppose that $\gamma$ has density $f$, and that there exist $p \in (1, \infty)$ and constants
$0 < C_1, C_2, C_3 < \infty$ such that for $|x| > C_1$, we have

    f(x) \le C_2 e^{-C_3 |x|^p}.

Then there exists some constant $C < \infty$ such that $\Lambda$, the log mgf of $\gamma$, satisfies the following
upper bound for all $t \in \mathbb{R}$:

    \Lambda(t) \le C |t|^{p/(p-1)} + C.

Moreover, this implies that $\Lambda$ satisfies the condition (H2).

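Proof. By Young’s inequality applied to the conjugate exponents $p$ and $p/(p-1)$, for $\epsilon > 0$
and $t, y \in \mathbb{R}$,

    ty \le (\epsilon^{-1/p} |t|)(\epsilon^{1/p} |y|) \le \frac{p-1}{p} \epsilon^{-1/(p-1)} |t|^{p/(p-1)} + \frac{\epsilon}{p} |y|^p.

In the following, let $C$ absorb all constants, and let $0 < \epsilon < C_3 p$ to find that for $t \in \mathbb{R}$,

    \Lambda(t) = \log \Big[ \int_{|y| \le C_1} e^{ty} f(y) \, dy + \int_{|y| > C_1} e^{ty} f(y) \, dy \Big]
               \le C_1 |t| + \log C_2 + \log \int_{\mathbb{R}} e^{ty} e^{-C_3 |y|^p} \, dy + C
               \le C |t| + C + \frac{p-1}{p} \epsilon^{-1/(p-1)} |t|^{p/(p-1)} - \frac{1}{p} \log(C_3 p - \epsilon)
               \le C |t|^{p/(p-1)} + C.

From the preceding inequalities, since the Gaussian measure $\nu$ has finite moments of every
order, it is clear that $\Lambda$ satisfies the integrability condition (H2). □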


We define the following analog of the log mgf in the case of weighted sums,

(2.2)    \Psi(t) := \int_{\mathbb{R}} \Lambda(tu) \, \nu(du), \quad t \in \mathbb{R}.

Our first main result is the following.


Theorem 2.4 (Weighted LDP). Assume (H1) and (H2). Then, for $\sigma$-a.e. $\theta \in \mathbb{S}$, the
sequence $(W^{(n)}_\theta)_{n \in \mathbb{N}}$ of (1.5) satisfies an LDP with the convex good rate function $I_\sigma$ given
by

(2.3)    I_\sigma(w) := \Psi^*(w) = \sup_{t \in \mathbb{R}} \{ tw - \Psi(t) \}.

The proof of Theorem 2.4 is given in Section 3, with intermediate steps established in
Section 3.1 and Section 3.2, and the proof completed in Section 3.3.

In principle, the rate function $I_\sigma$ of Theorem 2.4 could depend on the particular choice
of $\theta$, but our result shows that the rate function is the same for $\sigma$-a.e. $\theta$. In the case where
$\sigma$ is the product measure $\sigma = \bigotimes_{n \in \mathbb{N}} \sigma_{n-1}$, this follows immediately from the Kolmogorov
zero-one law. That is, let $\mathcal{T}_n$ be the sigma-algebra generated by $(\theta^{(n)}, \theta^{(n+1)}, \ldots)$, and let

(2.4)    \mathcal{T} := \bigcap_{n=1}^{\infty} \mathcal{T}_n

denote the tail sigma-algebra induced by $(\theta^{(1)}, \theta^{(2)}, \ldots)$. The rate function $I_\sigma$ is measurable
with respect to $\mathcal{T}$, and the Kolmogorov zero-one law states that $\mathcal{T}$ is trivial under the product
measure. Hence, $I_\sigma$ coincides for $\sigma$-a.e. $\theta \in \mathbb{S}$. However, our claim holds for general $\sigma$
satisfying (H1). In particular, Example 3.2b. gives an example of $\sigma$ such that $\theta^{(1)}, \theta^{(2)}, \ldots$
are highly dependent, $\mathcal{T}$ is not trivial, and hence, the lack of dependence of the rate function
$I_\sigma$ on $\theta$ is not a priori obvious.

Given the $\sigma$-a.e. statement of Theorem 2.4, it is natural to ask what happens on the set of
measure zero in $\mathbb{S}$ where the stated LDP does not hold. In particular, our second main result,
Theorem 2.5, shows that under certain additional conditions on $\Lambda$, the sequence of directions
$\iota$ associated with Cramér’s theorem is exceptional, in the sense that Cramér’s rate function
$I_\iota$ differs from the universal rate function $I_\sigma$. For the following theorem, we assume $\gamma$ is
symmetric, or specifically:

(H3)    \forall\, t \in \mathbb{R}, \quad \Lambda(t) = \Lambda(-t).

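Theorem 2.5 (Atypicality). Assume $\Lambda$ satisfies (H3), and let $I_\iota$ and $I_\sigma$ be given by
(2.1) and (2.3), respectively.

a. If $\Lambda \circ \sqrt{\cdot}$ is concave on $\mathbb{R}_+$, then $I_\sigma(w) \ge I_\iota(w)$ for all $w \in \mathbb{R}$.

b. If $\Lambda \circ \sqrt{\cdot}$ is convex on $\mathbb{R}_+$, then $I_\sigma(w) \le I_\iota(w)$ for all $w \in \mathbb{R}$.

c. If $\Lambda \circ \sqrt{\cdot}$ is concave or convex, but not linear, on $\mathbb{R}_+$, then $I_\sigma(w) = I_\iota(w) < \infty$ if and
only if $w = 0$.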

The proof of Theorem 2.5 is given in Section 4. 
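
To make the comparison in Theorem 2.5 concrete, the following Python sketch evaluates both rate functions numerically for one example that is not taken from the paper: a symmetric $\pm 1$ (Rademacher) variable, with $\Lambda(t) = \log\cosh t$. This choice satisfies (H2) and (H3) since $\Lambda(t) \le |t|$, and $\Lambda \circ \sqrt{\cdot}$ is concave because $\tanh(x)/x$ is decreasing on $\mathbb{R}_+$, so part a. applies. The quadrature range, the grids and all function names are assumptions of this sketch.

```python
import math


def log_cosh(x):
    """Numerically stable log(cosh(x)): the log mgf of a symmetric +/-1 (Rademacher) variable."""
    a = abs(x)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)


# Trapezoid nodes and standard Gaussian weights on [-8, 8] (truncation is an assumption).
H = 0.01
NODES = [-8.0 + H * k for k in range(1601)]
WEIGHTS = [H * math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi) for u in NODES]
WEIGHTS[0] *= 0.5
WEIGHTS[-1] *= 0.5


def psi(t):
    """Psi(t) = integral of Lambda(t*u) against the standard Gaussian, as in (2.2)."""
    return sum(w * log_cosh(t * u) for u, w in zip(NODES, WEIGHTS))


# For w >= 0 and symmetric log mgfs, the supremum over t is attained on t >= 0.
T_GRID = [0.01 * k for k in range(501)]
LAMBDA_VALUES = [log_cosh(t) for t in T_GRID]
PSI_VALUES = [psi(t) for t in T_GRID]


def rate(values, w):
    """Grid approximation of the Legendre transform sup_t { t*w - f(t) }."""
    return max(t * w - v for t, v in zip(T_GRID, values))


w = 0.5
cramer_rate = rate(LAMBDA_VALUES, w)      # approximates I_iota(w) = Lambda*(w)
universal_rate = rate(PSI_VALUES, w)      # approximates I_sigma(w) = Psi*(w)
```

In this example the universal rate exceeds the Cramér rate at $w = 0.5$, while both vanish at $w = 0$, in line with parts a. and c.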

We now provide some sufficient conditions (established in [1]) for the convexity or
concavity conditions of Theorem 2.5 to hold.

Proposition 2.6. Assume the exponential moment condition (1.3) and the symmetry
condition (H3).

i. Suppose $\gamma \ne \delta_0$, the Dirac mass at 0. Define $\varphi : \mathbb{N} \to \mathbb{R}$ by


(p{k) = {2k+\) 

If $\varphi$ is non-decreasing (resp., non-increasing), then $\Lambda \circ \sqrt{\cdot}$ is concave (resp., convex)
on $\mathbb{R}_+$.

ii. Suppose $\gamma$ has density $f$ such that $\log f \circ \sqrt{\cdot}$ is concave (resp., convex) on $\mathbb{R}_+$. Then
$\Lambda \circ \sqrt{\cdot}$ is concave (resp., convex) on $\mathbb{R}_+$.

Proof. Part i. is established in Theorem 7 of [1]. Part ii. follows from applying Theorem
12 of [1] with their $f$ replaced by our $f \circ \sqrt{\cdot}$, and noticing that the integrability of $f \circ \sqrt{\cdot}$
follows from the fact that $f$ has finite first moment, due to the exponential moment condition
of (1.3). □


Example 2.7. Suppose $\gamma$ is the generalized normal distribution with location 0, scale
$\alpha > 0$, and shape $\beta > 1$; that is, $\gamma = \mu_{\alpha,\beta}$ where

    \mu_{\alpha,\beta}(dx) := \frac{1}{2 \alpha \Gamma(1 + \frac{1}{\beta})} e^{-(|x|/\alpha)^\beta} \, dx.

It follows from Lemma 2.3 that $\mu_{\alpha,\beta}$ satisfies (H2), which implies (1.3). It is also easy to
see that $\mu_{\alpha,\beta}$ satisfies (H3). Thus, the conditions of Proposition 2.6 are satisfied. It follows
immediately from Proposition 2.6(ii) that for $\beta > 2$ (resp., for $\beta < 2$), $\Lambda \circ \sqrt{\cdot}$ is concave
(resp., convex). In fact, for $\beta \ne 2$, the concavity (resp., convexity) is strict.


The preceding example suggests the particular role of the Gaussian, which corresponds
to $\beta = 2$. In particular, $\gamma = \mu_{\alpha,2}$ for some $\alpha > 0$ if and only if $\Lambda \circ \sqrt{\cdot}$ is linear. Thus, we could
interpret the conditions of Theorem 2.5 as evaluating whether our distribution of interest is
“more” or “less” log-concave than the Gaussian. We also have the following result in the
Gaussian case (i.e., when $\gamma = \mu_{\alpha,2}$), which holds for all $\theta$ as opposed to just for $\sigma$-a.e. $\theta$.

Proposition 2.8. Suppose $\gamma = \mu_{\alpha,2}$ for some $\alpha > 0$. Then, for all $\theta \in \mathbb{S}$, the sequence
$(W^{(n)}_\theta)_{n \in \mathbb{N}}$ satisfies an LDP with the good rate function $I_\iota(w) = \Lambda^*(w) = (w/\alpha)^2$, where
$\Lambda^*$ is defined in (1.4) with $\Lambda$ the log mgf of the Gaussian with mean 0 and variance $\alpha^2/2$.


Proof. This follows from the fact that for all $n \in \mathbb{N}$, the Gaussian measure on $\mathbb{R}^n$ is
spherically symmetric, and hence, for any $\theta^{(n)} \in S^{n-1}$, the law of $W^{(n)}_\theta$ is the same
as the law of $W^{(n)}_\iota$. Thus, the LDP for $(W^{(n)}_\theta)_{n \in \mathbb{N}}$ follows from the classical Cramér’s
theorem for empirical means of i.i.d. Gaussians, for which the rate function can be easily
computed to be $\Lambda^*(w) = (w/\alpha)^2$. □


Remark 2.9. It is not clear whether a converse of Proposition 2.8 holds, that is,
whether $I_\sigma = I_\iota$ if and only if $\gamma$ is Gaussian. As one possible approach in this direction,
it would be sufficient to show that for any measure $\gamma$ satisfying both (H2) and (H3) (and
possibly some additional natural conditions), the function $\Lambda \circ \sqrt{\cdot}$ must be either concave or
convex.

Aside from the sequence of Cramér directions $\iota \in \mathbb{S}$, another natural sequence of
directions to consider is the sequence of canonical basis vectors, $e_1 = (e_1^{(1)}, e_1^{(2)}, \ldots) \in \mathbb{S}$, where

    e_1^{(n)} := (1, \underbrace{0, \ldots, 0}_{n-1 \text{ times}}) \in S^{n-1}.

Then $W^{(n)}_{e_1} = X_1 / \sqrt{n}$ for all $n$. The following result states that under certain tail conditions,
such normalized projections yield a trivial LDP, again with a rate function different from
$I_\sigma$.


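Proposition 2.10. Assume the following condition (which is stronger than (H2)):

(H2')    \exists\, C < \infty, \; r \in [0, 2) \text{ such that } \forall\, t \in \mathbb{R}, \quad \Lambda(t) \le C(1 + |t|^r).

Then the sequence $(W^{(n)}_{e_1})_{n \in \mathbb{N}}$ satisfies an LDP with the trivial good rate function $\chi_0$ given
by

    \chi_0(x) = 0 \text{ if } x = 0; \qquad \chi_0(x) = \infty \text{ if } x \ne 0.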

Proof. Consider the limit log mgf associated with the Gärtner-Ellis theorem (recalled
for convenience later in Theorem 3.7). For all $t \in \mathbb{R}$,

    \Lambda_n(t) := \frac{1}{n} \log \mathbb{E}[\exp(t n W^{(n)}_{e_1})] = \frac{1}{n} \log \mathbb{E}[\exp(t \sqrt{n} X_1)] = \frac{1}{n} \Lambda(t \sqrt{n}) \le \frac{C}{n} (1 + |t|^r n^{r/2}).

Since the exponent $r$ of (H2') satisfies $r < 2$ by assumption, we have $\lim_{n \to \infty} \Lambda_n(t) = 0$ for
all $t \in \mathbb{R}$. Thus, by the Gärtner-Ellis theorem, the sequence $(W^{(n)}_{e_1})_{n \in \mathbb{N}}$ satisfies an LDP with
good rate function $0^* = \chi_0$. □
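
The decay of $\frac{1}{n} \Lambda(t\sqrt{n})$ used in this proof can be observed numerically. The following Python sketch uses, as an assumed example not taken from the paper, the Rademacher log mgf $\Lambda(t) = \log\cosh t$, which satisfies (H2') with $C = 1$ and $r = 1$ because $\log\cosh t \le |t|$. The function names are illustrative.

```python
import math


def log_cosh(x):
    """Numerically stable log(cosh(x))."""
    a = abs(x)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)


def scaled_log_mgf(t, n):
    """Lambda_n(t) = (1/n) * Lambda(t * sqrt(n)) for the direction e_1, with Lambda = log cosh."""
    return log_cosh(t * math.sqrt(n)) / n


# Values of Lambda_n(1) along an increasing sequence of n; they shrink roughly like 1/sqrt(n).
values = [scaled_log_mgf(1.0, n) for n in (10, 1_000, 100_000, 10_000_000)]
```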


2.1. Relation to prior work. There is a wealth of literature on large deviations for
weighted sums, but our work seems to be the first to emphasize the unique position of
Cramér’s theorem in the geometric setting. Moreover, it appears that none of the existing
literature is readily adaptable to our particular problem. We offer a partial (and inevitably
incomplete) survey of existing results.

In the somewhat classical works of Book, [2] and [3], we can find asymptotic bounds
for quantities of the form

    \mathbb{P}\Big( \frac{\sum_{k=1}^n a_{nk} X_k}{\sum_{k=1}^n a_{nk}} > c \Big),

where $(a_{nk})_{k \le n, n \in \mathbb{N}}$ is a triangular array of weights such that $\sum_{k=1}^n a_{nk}^2 = 1$ for all $n$. However,
this does not address our setting because if we let $a_{nk} = \theta^{(n)}_k$, we have $\sum_{k=1}^n a_{nk}^2 = 1$, but this
only yields tail bounds of the form $\mathbb{P}(W^{(n)}_\theta > c\, n^{-1/2} \sum_{k=1}^n \theta^{(n)}_k)$, as opposed to the desired
asymptotics for $\mathbb{P}(W^{(n)}_\theta > c)$. Furthermore, Book does not establish an LDP or identify a
rate function.

In a more recent line of work, consider [14], where on their p. 932, their $\lambda$ and $\nu$
correspond to our $n$ and $k$, respectively. For $Z \sim \mathcal{N}(0,1)$, we have the correspondence:

    a_j(n) = 1_{\{j < n\}} \sqrt{n}\, \theta^{(n)}_{j+1}, \quad j, n \in \mathbb{N};
    a_k = \lim_{n \to \infty} \frac{1}{n} \sum_{j=0}^{\infty} |a_j(n)|^k = \mathbb{E}[|Z|^k], \quad k \in \mathbb{N};
    \phi(n) = n, \quad n \in \mathbb{N}.

Suppose that the sequence $(a_k)_{k \in \mathbb{N}}$ (which depends on the particular choice of weights
$a_j(n)$, $j, n \in \mathbb{N}$) satisfies the following condition (from p. 932 of [14]):

(2.5)    \lim_{k \to \infty} a_k^{1/k} < \infty.

The main result of [14] is that for a sequence of i.i.d. random variables $(X_k)_{k \in \mathbb{N}}$ with
cumulants $(c_k)_{k \in \mathbb{N}}$, if condition (2.5) holds, then the sequence of weighted means
$\frac{1}{\phi(n)} \sum_j a_j(n) X_j$, $n \in \mathbb{N}$, satisfies an LDP with rate function $\chi^*$, the Legendre transform of
$\chi(t) = \sum_{k=1}^{\infty} \frac{a_k c_k}{k!} t^k$. However, the finiteness condition (2.5) does not hold in our setting of
$a_k = \mathbb{E}[|Z|^k]$, since the following limit is infinite:

    \lim_{k \to \infty} \mathbb{E}[|Z|^k]^{1/k} = \sqrt{2} \lim_{k \to \infty} \Gamma\big(\tfrac{k+1}{2}\big)^{1/k} = \infty.

Therefore, the weighted mean LDP of [14] does not apply in our setting.
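
The divergence of $\mathbb{E}[|Z|^k]^{1/k}$ can be checked directly from the closed form $\mathbb{E}[|Z|^k] = 2^{k/2} \Gamma(\frac{k+1}{2}) / \sqrt{\pi}$ for a standard Gaussian $Z$. The following Python sketch evaluates the $k$-th roots on a log scale to avoid overflow; the function name and the chosen values of $k$ are illustrative assumptions.

```python
import math


def abs_moment_root(k):
    """Return (E|Z|^k)^(1/k) for standard Gaussian Z, via E|Z|^k = 2^(k/2) Gamma((k+1)/2) / sqrt(pi)."""
    log_moment = (
        0.5 * k * math.log(2.0)
        + math.lgamma(0.5 * (k + 1))
        - 0.5 * math.log(math.pi)
    )
    return math.exp(log_moment / k)


# The roots grow without bound (roughly like sqrt(k / e) for large k, by Stirling's formula).
roots = [abs_moment_root(k) for k in (2, 10, 100, 1000, 10000)]
```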

Yet more recently, [16] proves an LDP for weighted empirical means similar to (1.5),
except with weights that are uniformly bounded (in $n$). Our results correspond to unbounded
weights which are not covered by their results. Similarly, [6] proves an LDP for
empirical means of certain bounded functionals, which again fails to apply to our unbounded
weights.

In the context of information theory, [8] states an LDP for sums of the form $\frac{1}{n} \sum_{i=1}^n \rho(x_i, Y_i)$,
where $(x_i)_{i \in \mathbb{N}}$ are “weights”, $(Y_i)_{i \in \mathbb{N}}$ is a sequence of random variables satisfying certain
mixing properties, and $\rho : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}_+$ for Polish spaces $\mathcal{X}$ and $\mathcal{Y}$. The LDP is stated
in the form of a generalized asymptotic equipartition property for “distortion measures”.
However, note that $\rho$ is assumed to be nonnegative, so a function like $\rho(x, y) = xy$
(corresponding to projections) does not fit within the setting of [8]. Moreover, their weights
$(x_i)_{i \in \mathbb{N}}$ are assumed to be a realization of a stationary ergodic process, which is not the case
for our weights that are drawn from the scaled sphere $\sqrt{n}\, S^{n-1}$. This lends our work
a geometric rather than information-theoretic interpretation.

The paper [13], co-authored by the first and third authors of this work, also analyzes
weighted sums of i.i.d. random variables, but there the emphasis is on sums of subexponential
random variables, rather than the weights themselves.

The most closely related work to our own is the recent work of [4], which gives strong
large deviations (i.e., refined asymptotics) for weighted sums of i.i.d. random variables and
i.i.d. weights, conditioned on the weights. Our weights $\sqrt{n}\, \theta^{(n)}$ are not i.i.d., but in Section
3.1 and Section 3.2, we prove that Theorem 2.4 can be reduced to an LDP for the sequence
$(\widehat{W}^{(n)}_z)_{n \in \mathbb{N}}$ defined in (3.2), which is an i.i.d. weighted sum, conditional on given weights.
With some additional calculations from this point, the rate function of Theorem 2.4
could then be deduced from the conditional LDP of [4], stated in their Theorem 1.6 with
rate function defined in their equation (1.13). Note that condition (iii) of their Theorem 1.6
has two parts, but our integrability condition (H2) corresponds only to their first part; in fact,
it follows from Lemma 3.8 that their second part follows from our condition (3.5), which
is weaker than (H2), and thus need not be assumed separately. Moreover, our research
(completed independently) differs due to our emphasis on a geometric point of view; as a
consequence, we can explicitly identify a rate function and highlight the atypical position
occupied by Cramér’s theorem.

Lastly, the methods we use are a simplification of those developed in a companion paper
[12], where we consider normalized projections of certain non-product measures, as well
as projections in random directions.

3. The $\sigma$-almost everywhere LDP.

3.1. The surface measure on $S^{n-1}$. In this section, we recall a convenient representation
for a random vector distributed according to the surface measure on $S^{n-1}$, in order to
obtain (3.3), which reduces $\sigma$-a.e. statements into more tractable statements about Gaussian
random variables. Let $\mathbb{A} := \prod_{n \in \mathbb{N}} \mathbb{R}^n$ denote the space of infinite triangular arrays. That is,
$z \in \mathbb{A}$ is of the form $z = (z^{(1)}, z^{(2)}, \ldots)$ where $z^{(n)} \in \mathbb{R}^n$ for all $n \in \mathbb{N}$. Let $\mathfrak{R} : \mathbb{A} \to \mathbb{S}$ be the
map such that for $z \in \mathbb{A}$, the $n$-th row of $\mathfrak{R}(z)$ is

    \frac{z^{(n)}}{\|z^{(n)}\|_n}.

Let $\pi_n : \mathbb{A} \to \mathbb{R}^n$ denote the $n$-th row map such that $\pi_n(z) = z^{(n)}$. Let $\nu$ denote the Gaussian
measure on $\mathbb{R}$, and let $\nu^{\otimes n}$ denote the standard Gaussian measure on $\mathbb{R}^n$.



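Lemma 3.1. If $\zeta \in \mathcal{P}(\mathbb{A})$ is such that

(3.1)    \zeta \circ \pi_n^{-1} = \nu^{\otimes n}, \quad n \in \mathbb{N},

then $\sigma = \zeta \circ \mathfrak{R}^{-1}$ satisfies (H1). Conversely, if $\sigma \in \mathcal{P}(\mathbb{S})$ satisfies (H1), then there exists
some $\zeta \in \mathcal{P}(\mathbb{A})$ satisfying (3.1) such that $\sigma = \zeta \circ \mathfrak{R}^{-1}$.

Proof. Both results are merely a restatement of the well known fact that if $Z^{(n)}$ has the
$n$-dimensional standard Gaussian distribution, then $Z^{(n)} / \|Z^{(n)}\|_n$ is uniformly distributed on
the unit sphere $S^{n-1}$ and independent of $\|Z^{(n)}\|_n$. □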
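
The fact used in this proof also gives a practical way to sample a direction from $\sigma_{n-1}$: normalize a vector of i.i.d. standard Gaussians. The following Python sketch does this with the standard library; the function name, the seed and the dimension are assumptions made only for illustration.

```python
import math
import random


def random_direction(n, rng):
    """Sample a point of S^{n-1} by normalizing a vector of n i.i.d. standard Gaussians.

    Returns the unit vector together with the Euclidean norm of the Gaussian vector.
    """
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(zi * zi for zi in z))
    return [zi / norm for zi in z], norm


rng = random.Random(0)   # fixed seed, only for reproducibility of the sketch
theta, radius = random_direction(5, rng)
```

Drawing each row of a triangular array independently in this way corresponds to the product measure case of (H1).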


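Note that Lemma 3.1 states that for any given $\sigma \in \mathcal{P}(\mathbb{S})$ satisfying (H1), we can find a
corresponding $\zeta \in \mathcal{P}(\mathbb{A})$. Fix such a pair $(\sigma, \zeta)$. Now, for $z \in \mathbb{A}$, define

(3.2)    \widehat{W}^{(n)}_z := \frac{1}{n} \sum_{i=1}^n X_i z^{(n)}_i.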

Then, given $W^{(n)}_\theta$ as defined in (1.5), and any good rate function $I : \mathbb{R} \to [0, \infty]$, Lemma 3.1
implies that

(3.3)    \sigma\Big( \theta \in \mathbb{S} : (W^{(n)}_\theta)_{n \in \mathbb{N}} \text{ satisfies an LDP with good rate function } I \Big)
             = \zeta\Big( z \in \mathbb{A} : \Big( \tfrac{\sqrt{n}}{\|z^{(n)}\|_n} \widehat{W}^{(n)}_z \Big)_{n \in \mathbb{N}} \text{ satisfies an LDP with good rate function } I \Big).

In addition, Lemma 3.1 yields a large class of examples of $\sigma$ satisfying (H1), constructed
via $\zeta$ satisfying (3.1). We specify two such examples below.

Example 3.2.

a. Consider the completely independent case, where the elements $Z^{(n)}_i$, $i = 1, \ldots, n$, $n \in \mathbb{N}$,
are all independent; then the law of $\mathfrak{R}(Z)$ is the product measure $\sigma = \bigotimes_{n \in \mathbb{N}} \sigma_{n-1}$,
where each row of $\theta$ is independent under $\sigma$. As previously noted, the tail sigma-algebra
$\mathcal{T}$ induced by the rows (defined in (2.4)) is trivial in this case due to the
Kolmogorov zero-one law.

b. Alternatively, consider the following highly dependent case: let $\zeta \in \mathcal{P}(\mathbb{A})$ satisfy (3.1)
such that for $\zeta$-a.e. $z \in \mathbb{A}$, we have $z^{(n)}_i = z^{(m)}_i$ for all $i \in \mathbb{N}$ and $m, n \ge i$ (i.e., constant
within columns). Then, let $\sigma = \zeta \circ \mathfrak{R}^{-1}$, so that $\sigma$ satisfies (H1) by Lemma 3.1. In
this case, there is strong dependence across rows which precludes a claim regarding
triviality of the tail sigma-algebra $\mathcal{T}$ induced by the rows. In fact, consider the event

    A = \Big\{ \theta \in \mathbb{S} : \lim_{n \to \infty} \sqrt{n}\, \theta^{(n)}_1 > 0 \Big\}.

Note that $A$ is measurable with respect to $\mathcal{T}$. However, we also have, due to the strong
law of large numbers ($\zeta$-a.e., as stated precisely in (3.4)),

    \sigma(A) = \zeta\Big( z \in \mathbb{A} : \lim_{n \to \infty} \sqrt{n}\, \mathfrak{R}(z)^{(n)}_1 > 0 \Big) = \zeta\big( z \in \mathbb{A} : z^{(1)}_1 > 0 \big) = \frac{1}{2}.

That is, $\mathcal{T}$ is non-trivial, and so $I_\sigma$ cannot a priori be declared as $\sigma$-a.e. constant
through a simple analysis of the tail sigma-algebra.

Remark 3.3. We assume the condition (H1) not in an attempt to be as general as possible,
but rather to point out that the universality of the rate function is a genuinely interesting
phenomenon. Specifically, if we only consider the independent case of Example 3.2a., then
the fact that $I_\sigma$ is “universal” (in that it does not depend on $\theta$) is a consequence of the fact
that the tail sigma-algebra $\mathcal{T}$ is trivial. However, Example 3.2b. shows that universality of
the rate function is a more general phenomenon that holds even when $\mathcal{T}$ is non-trivial. The
condition (H1) only imposes constraints on the “marginal” distribution of the $n$-th row of
the array $\theta$, and imposes no restrictions on the dependence across different rows $n \in \mathbb{N}$.
In fact, for $Z \sim \zeta$ satisfying (3.1), the elements of $Z$ need not even be jointly Gaussian in
order for the law of $\mathfrak{R}(Z)$ to satisfy (H1).

3.2. Exponential equivalence. As a consequence of Lemma 3.1 and the equality in
(3.3), we can replace $\sigma$-a.e. statements about $W^{(n)}_\theta$, $n \in \mathbb{N}$, with $\zeta$-a.e. statements about
$(\sqrt{n} / \|z^{(n)}\|_n) \widehat{W}^{(n)}_z$, $n \in \mathbb{N}$. In this section, we go further and explain why in the large
deviations setting, we can ignore the contribution of the multiplicative factor $\sqrt{n} / \|z^{(n)}\|_n$. That is,
we show that such a factor yields an exponentially equivalent sequence, defined as follows.

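Definition 3.4. Let $(\xi_n)_{n \in \mathbb{N}}$ and $(\tilde{\xi}_n)_{n \in \mathbb{N}}$ be two sequences of $\mathbb{R}$-valued random
variables such that for all $\delta > 0$,

    \limsup_{n \to \infty} \frac{1}{n} \log \mathbb{P}(|\xi_n - \tilde{\xi}_n| > \delta) = -\infty;

then $(\xi_n)_{n \in \mathbb{N}}$ and $(\tilde{\xi}_n)_{n \in \mathbb{N}}$ are said to be exponentially equivalent.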


Proposition 3.5 ([9]). If $(\xi_n)_{n \in \mathbb{N}}$ is a sequence of random variables that satisfies
an LDP with good rate function $I$, and $(\tilde{\xi}_n)_{n \in \mathbb{N}}$ is another sequence that is exponentially
equivalent to $(\xi_n)_{n \in \mathbb{N}}$, then $(\tilde{\xi}_n)_{n \in \mathbb{N}}$ satisfies an LDP with good rate function $I$.


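Lemma 3.6. Let $(\xi_n)_{n \in \mathbb{N}}$ be a sequence of random variables that satisfies an LDP with
a good rate function $I$. Let $(a_n)_{n \in \mathbb{N}}$ be a deterministic sequence such that $a_n \to 1$ as $n \to \infty$,
and let $(\tilde{\xi}_n)_{n \in \mathbb{N}}$ be another sequence defined by:

    \tilde{\xi}_n := a_n \xi_n, \quad n \in \mathbb{N}.

If $I$ is quasiconvex (that is, if the set $\{x \in \mathbb{R} : I(x) \in (-\infty, c)\}$ is convex for all $c \in \mathbb{R}$),
then $(\xi_n)_{n \in \mathbb{N}}$ and $(\tilde{\xi}_n)_{n \in \mathbb{N}}$ are exponentially equivalent.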

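Proof. For $\epsilon > 0$, let $N_\epsilon < \infty$ be such that for all $n \ge N_\epsilon$, we have $|1 - a_n| < \epsilon$. For
$n \ge N_\epsilon$ and any $\delta > 0$,

    |\tilde{\xi}_n - \xi_n| > \delta \iff |\xi_n| \cdot |1 - a_n| > \delta \implies |\xi_n| > \frac{\delta}{\epsilon}.

Because $I$ is lower semicontinuous and has compact level sets, it achieves its global minimum
at some (not necessarily unique) $\bar{x} \in \mathbb{R}$. Fix $\delta > 0$ and let $\epsilon > 0$ be small enough such
that $|\bar{x}| < \delta/\epsilon$. Then,

    \limsup_{n \to \infty} \frac{1}{n} \log \mathbb{P}(|\tilde{\xi}_n - \xi_n| > \delta) \le \limsup_{n \to \infty} \frac{1}{n} \log \mathbb{P}\big(|\xi_n| > \tfrac{\delta}{\epsilon}\big)
        \le -\inf_{|x| \ge \delta/\epsilon} I(x)
        = -\min\big\{ I\big(\tfrac{\delta}{\epsilon}\big), I\big(-\tfrac{\delta}{\epsilon}\big) \big\}.

The second inequality follows from the LDP for $(\xi_n)_{n \in \mathbb{N}}$. The last equality follows from
the fact that if a quasiconvex function has a global minimizer $\bar{x}$, then it is non-increasing
for $x < \bar{x}$, and non-decreasing for $x > \bar{x}$ [15, Lemma 1]. Hence, since the rate function $I$
is quasiconvex and has a global minimizer $\bar{x}$ which satisfies $|\bar{x}| < \delta/\epsilon$, it follows that if
$x \ge \delta/\epsilon$ (resp., $x \le -\delta/\epsilon$), then we have $I(x) \ge I(\delta/\epsilon)$ (resp., $I(x) \ge I(-\delta/\epsilon)$). Lastly,
take the limit as $\epsilon \to 0$, and use the compactness of the level sets of $I$ to conclude that
$I(\delta/\epsilon) \to +\infty$ and $I(-\delta/\epsilon) \to +\infty$. This proves the required exponential equivalence. □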


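Fix $\zeta$ satisfying (3.1). Due to the strong law of large numbers, we have that for $\zeta$-a.e.
$z \in \mathbb{A}$,

(3.4)    \frac{\sqrt{n}}{\|z^{(n)}\|_n} = \Big( \frac{1}{n} \sum_{i=1}^n (z^{(n)}_i)^2 \Big)^{-1/2} \to 1.

Thus, we are in a prime position to apply Lemma 3.6, which motivates the analysis of an
LDP for $(\widehat{W}^{(n)}_z)_{n \in \mathbb{N}}$.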


3.3. Proof of the LDP for $(\widehat{W}^{(n)}_z)_{n \in \mathbb{N}}$. We aim to prove an LDP for the sequence
$(\widehat{W}^{(n)}_z)_{n \in \mathbb{N}}$; that is, an LDP for sums of independent but not identically distributed random
variables (where the lack of identical distribution comes from the inhomogeneous weights
$z^{(n)}_i$ within the sum). The Gärtner-Ellis theorem (recalled below) is well suited for such an LDP.


Theorem 3.7 (Gärtner-Ellis). Let $(\xi_n)_{n \in \mathbb{N}}$ be a sequence of $\mathbb{R}$-valued random
variables. Suppose that the limit log mgf $\bar{\Lambda} : \mathbb{R} \to [0, \infty)$ defined by

    \bar{\Lambda}(t) = \lim_{n \to \infty} \frac{1}{n} \log \mathbb{E}[e^{t n \xi_n}]

is finite and differentiable at all $t \in \mathbb{R}$. Then $(\xi_n)_{n \in \mathbb{N}}$ satisfies an LDP with the convex good
rate function $\bar{\Lambda}^*$, the Legendre transform of $\bar{\Lambda}$.


For a proof of Theorem 3.7, we refer to [10, Theorem V.6], which also includes a more
general version of the Gärtner-Ellis theorem that applies even if $\bar{\Lambda}$ is finite for only some
$t \in \mathbb{R}$ (under mild additional conditions).

The following lemma establishes a property of $\Psi$ which will be used in the application
of the Gärtner-Ellis theorem.


Lemma 3.8. Suppose that

(3.5)    \forall\, t \in \mathbb{R}, \quad \int_{\mathbb{R}} |\Lambda(tu)| \, \nu(du) < \infty.

Then, the function $\Psi$ of (2.2) is differentiable on $\mathbb{R}$.

Proof. For each $t \in \mathbb{R}$, differentiability of $\Psi$ at $t$ follows from the differentiability of
$t \mapsto \Lambda(tu)$ for all $u \in \mathbb{R}$, and an application of the dominated convergence theorem with the
dominating function

    g_t(u) = |\Lambda'((t-1)u) u| + |\Lambda'((t+1)u) u|, \quad u \in \mathbb{R}.

Indeed, fix $t \in \mathbb{R}$ and for each $\delta \in (-1, 1)$ and $u \in \mathbb{R}$, define the difference quotient
$R_{t,\delta}(u) = [\Lambda((t+\delta)u) - \Lambda(tu)] / \delta$. Then,

    |R_{t,\delta}(u)| \le \sup\{ |\Lambda'((t+a)u) u| : a \in [-1, 1] \} \le g_t(u),

where the last inequality uses the fact that $t \mapsto u \Lambda'(tu)$ is monotone. To show that $g_t$ is
integrable, first note that the convexity of $\Lambda$ implies that for $s, u \in \mathbb{R}$,

    \Lambda(su) - \Lambda(0) \le \Lambda'(su)\, su \le \Lambda(2su) - \Lambda(su),

and hence,

    |\Lambda'(su)\, su| \le |\Lambda(0)| + |\Lambda(su)| + |\Lambda(2su)|.

Since, by the assumption (3.5), for every $s \in \mathbb{R}$, the right-hand side is an integrable function
of $u$, it follows that $g_t$ is also integrable for every $t \in \mathbb{R}$. □


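Proof of Theorem 2.4. Due to Lemma 3.1 (in particular, its consequence, (3.3)), it
suffices to prove a $\zeta$-a.e. LDP for the sequence $((\sqrt{n} / \|z^{(n)}\|_n) \widehat{W}^{(n)}_z)_{n \in \mathbb{N}}$, where $\widehat{W}^{(n)}_z$ is
defined as in (3.2). Due to Lemma 3.6 and the limit (3.4), it suffices to prove a $\zeta$-a.e. LDP
for $(\widehat{W}^{(n)}_z)_{n \in \mathbb{N}}$. To this end, we consider the Gärtner-Ellis limit log mgf for the sequence
$(\widehat{W}^{(n)}_z)_{n \in \mathbb{N}}$. For every $n \in \mathbb{N}$ and $t \in \mathbb{R}$, we have, due to the independence of $X_i$, $i = 1, \ldots, n$,

(3.6)    \Lambda_{n,z}(t) := \frac{1}{n} \log \mathbb{E}\big[\exp(t n \widehat{W}^{(n)}_z)\big] = \frac{1}{n} \log \prod_{i=1}^n \mathbb{E}\big[\exp(t X_i z^{(n)}_i)\big] = \frac{1}{n} \sum_{i=1}^n \Lambda(t z^{(n)}_i).

We first claim that for $\zeta$-a.e. $z \in \mathbb{A}$, the Gärtner-Ellis limit log mgf, the limit of (3.6),
satisfies, for each $t \in \mathbb{R}$,

(3.7)    \lim_{n \to \infty} \Lambda_{n,z}(t) = \int_{\mathbb{R}} \Lambda(tu) \, \nu(du) = \Psi(t),

with $\Psi$ as defined in (2.2).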

We proceed by proving the following modified claim (obtained by interchanging the
quantifiers in our original claim): for each $t \in \mathbb{R}$, for $\zeta$-a.e. $z \in \mathbb{A}$, the expression (3.7)
holds. Note that if $z$ were an i.i.d. sequence instead of a triangular array, our modified claim
would follow from the usual strong law of large numbers. However, the strong LLN does
not necessarily extend to empirical means of rows of i.i.d. random variables in a triangular
array (see, e.g., [18, Example 5.41]). On the other hand, if the common distribution of the
i.i.d. elements (in our case, each of the random variables $\Lambda(t z^{(n)}_i)$, $i = 1, \ldots, n$, $n \in \mathbb{N}$) has
finite fourth moment, then the strong LLN follows from a standard weak LLN and Borel-Cantelli
argument [18, p. 113, (i)]. Due to our assumption (H2), it follows that for all $t \in \mathbb{R}$,
for $\zeta$-a.e. $z \in \mathbb{A}$, the limit (3.7) holds.

Next, we aim to interchange the quantifiers to establish the original claim. Note that for
each $n \in \mathbb{N}$, $\Lambda_{n,z}$ of (3.6) is a convex function (since it is the sum of convex functions).
Now, let $\Gamma \subset \mathbb{R}$ be countable and dense. Then, it follows from countable additivity that for
$\zeta$-a.e. $z \in \mathbb{A}$, the convex functions $\Lambda_{n,z}(t)$ converge pointwise as $n \to \infty$ to $\Psi(t)$, for all $t$ in
the dense subset $\Gamma \subset \mathbb{R}$. Hence, the convex analytic considerations of [17, Theorem 10.8]
imply that the pointwise convergence of $\Lambda_{n,z}(t)$ to $\Psi(t)$ holds for all $t \in \mathbb{R}$. That is, for
$\zeta$-a.e. $z \in \mathbb{A}$, for all $t \in \mathbb{R}$, the limit (3.7) holds, proving our original claim.

Since (H2) holds, $\Psi(t) < \infty$ for all $t \in \mathbb{R}$ and, because (3.5) follows trivially from (H2),
Lemma 3.8 implies that $\Psi$ is differentiable on $\mathbb{R}$. Therefore, by the Gärtner-Ellis Theorem
(Theorem 3.7), for $\zeta$-a.e. $z \in \mathbb{A}$, the sequence $(\widehat{W}^{(n)}_z)_{n \in \mathbb{N}}$ satisfies an LDP with good rate
function $\Psi^*$. □
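
The convergence in (3.7) can be illustrated by simulation for a single long row of Gaussian weights. The following Python sketch compares the row average in (3.6) with a quadrature approximation of $\Psi(t)$, using as an assumed example (not taken from the paper) the Rademacher log mgf $\Lambda(t) = \log\cosh t$. The seed, the row length, the quadrature range and the function names are choices made only for this sketch.

```python
import math
import random


def log_cosh(x):
    """Numerically stable log(cosh(x))."""
    a = abs(x)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)


def psi(t, half_width=8.0, step=0.01):
    """Trapezoid approximation of Psi(t) = E[Lambda(t Z)], Z standard Gaussian, Lambda = log cosh."""
    m = int(round(2.0 * half_width / step))
    total = 0.0
    for k in range(m + 1):
        u = -half_width + step * k
        weight = 0.5 if k in (0, m) else 1.0
        total += weight * log_cosh(t * u) * math.exp(-0.5 * u * u)
    return total * step / math.sqrt(2.0 * math.pi)


def row_average(t, z_row):
    """Lambda_{n,z}(t) = (1/n) * sum_i Lambda(t * z_i) for one row z of the array, as in (3.6)."""
    return sum(log_cosh(t * zi) for zi in z_row) / len(z_row)


rng = random.Random(0)                              # fixed seed for reproducibility
n = 200_000
z_row = [rng.gauss(0.0, 1.0) for _ in range(n)]     # one row with i.i.d. standard Gaussian entries
t = 1.0
empirical = row_average(t, z_row)
limit = psi(t)
```

A single row only illustrates the law of large numbers within that row; the almost sure statement over the whole triangular array is what the fourth moment argument above provides.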


4. Atypicality. In this section, we compare the rate function $I_\sigma$ with the Cramér rate
function $I_\iota$. We first use Jensen’s inequality to compare the associated log mgfs $\Psi$ and $\Lambda$.


Lemma 4.1. Assume (H3), and let $\Lambda$ and $\Psi$ be defined as in (1.3) and (2.2), respectively.

a. If $\Lambda \circ \sqrt{\cdot}$ is concave on $\mathbb{R}_+$, then $\Psi(t) \le \Lambda(t)$ for all $t \in \mathbb{R}$.

b. If $\Lambda \circ \sqrt{\cdot}$ is convex on $\mathbb{R}_+$, then $\Psi(t) \ge \Lambda(t)$ for all $t \in \mathbb{R}$.

c. If $\Lambda \circ \sqrt{\cdot}$ is concave or convex, but not linear, on $\mathbb{R}_+$, then $\Lambda(t) = \Psi(t)$ if and only if
$t = 0$.

Proof. We begin with part a. Let $\nu$ be the standard Gaussian distribution, and let $Z \sim \nu$
be a standard Gaussian random variable. Then, for all $t \in \mathbb{R}$, we have

    \Psi(t) = \mathbb{E}[\Lambda(tZ)]
            = \mathbb{E}\big[\Lambda\big((t^2 Z^2)^{1/2}\big)\big]                (symmetry)
            \le \Lambda\big((\mathbb{E}[t^2 Z^2])^{1/2}\big)                      (Jensen)
            = \Lambda(t).

Similar calculations can be used to establish part b. As for part c., recall that in Jensen’s
inequality, equality holds if and only if either: (i) $\Lambda \circ \sqrt{\cdot}$ is linear; or (ii) the underlying
random variable is almost surely constant. Note that (i) is not the case by assumption. As
for (ii), this holds if and only if $t^2 Z^2$ is almost surely constant, which is the case if and only
if $t = 0$. □
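
The inequality of part a. and the linear (Gaussian) boundary case can be checked numerically. The following Python sketch compares $\Lambda$ and $\Psi$ on a small grid for two assumed examples: the Rademacher log mgf $\log\cosh t$, for which $\Lambda \circ \sqrt{\cdot}$ is concave, and a centered Gaussian log mgf with variance $\alpha^2/2$, for which $\Lambda \circ \sqrt{\cdot}$ is linear. The value $\alpha = 1.5$, the quadrature and the function names are illustrative assumptions.

```python
import math


def gaussian_expectation(f, half_width=8.0, step=0.01):
    """Trapezoid approximation of E[f(Z)] for a standard Gaussian Z, truncated to [-8, 8]."""
    m = int(round(2.0 * half_width / step))
    total = 0.0
    for k in range(m + 1):
        u = -half_width + step * k
        weight = 0.5 if k in (0, m) else 1.0
        total += weight * f(u) * math.exp(-0.5 * u * u)
    return total * step / math.sqrt(2.0 * math.pi)


def log_cosh(x):
    """Numerically stable log(cosh(x)): Rademacher log mgf."""
    a = abs(x)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)


def gaussian_log_mgf(t, alpha=1.5):
    """Log mgf of a centered Gaussian with variance alpha^2 / 2; Lambda(sqrt(s)) is linear in s."""
    return 0.25 * alpha * alpha * t * t


def psi(log_mgf, t):
    """Psi(t) = E[Lambda(t Z)] as in (2.2), for a given log mgf Lambda."""
    return gaussian_expectation(lambda u: log_mgf(t * u))


ts = [0.25 * k for k in range(1, 13)]    # t = 0.25, ..., 3.0
gap_rademacher = [log_cosh(t) - psi(log_cosh, t) for t in ts]               # Lambda - Psi, concave case
gap_gaussian = [gaussian_log_mgf(t) - psi(gaussian_log_mgf, t) for t in ts]  # Lambda - Psi, linear case
```

The first list is strictly positive, as in parts a. and c., and the second vanishes up to quadrature error, matching the Gaussian case in which the two log mgfs coincide.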


Before we prove the theorem, we recall some basic facts about the log mgf of $X_1 \sim \gamma$. Let
the domain of a function $f : \mathbb{R} \to (-\infty, \infty]$ be the set $D_f = \{x \in \mathbb{R} : f(x) < \infty\}$. For a set $D \subset \mathbb{R}$,
let $D^\circ$ denote the interior of $D$.

Lemma 4.2. Let $\Lambda(t) = \log \mathbb{E}[e^{t X_1}]$ be the log mgf of some random variable $X_1$. Then,

1. $\Lambda$ is lower semicontinuous;

2. $\Lambda$ is smooth in $D_\Lambda^\circ$;

3. $\Lambda$ is convex.

Furthermore, if $X_1$ is non-degenerate (i.e., not a.s. constant), then

4. $\Lambda$ is strictly convex in $D_\Lambda^\circ$;

5. $\Lambda^*$ is differentiable in $D_{\Lambda^*}^\circ$;

6. for $x \in D_{\Lambda^*}^\circ$, the maximum in the definition of the Legendre transform is uniquely
attained; that is, the following quantity is well defined:

(4.1)    t_x := \arg\max_{t \in \mathbb{R}} \{ tx - \Lambda(t) \}.

Proof. These are mostly standard, but we provide sketches of the proofs. For 1., lower
semicontinuity follows from Fatou’s lemma. For 2., smoothness follows from interchanging
differentiation and expectation. Convexity in 3. and strict convexity in 4. follow from
Hölder’s inequality. As for 5., it is classical that if a function is lower semicontinuous and
strictly convex in the interior of its domain, then its Legendre transform is differentiable in
the interior of its domain (see [17, Theorem 26.3]). Lastly, for 6., it is also classical that for
$x \in D_{\Lambda^*}^\circ$, we have $t_x = (\Lambda^*)'(x)$ (see [17, Theorem 26.5]). □

Proof of Theorem 2.5. Assume without loss of generality that $X_1$ is non-degenerate.
If it were degenerate, then due to the symmetry condition (H3), the law of $X_1$ must be
$\gamma = \delta_0$, in which case $\Lambda = \Psi = 0$. Therefore, $I_\sigma$ and $I_\iota$ are both equal to the characteristic
function at 0 (which is equal to 0 at $w = 0$ and $+\infty$ for all other $w$), and the result is trivial.

Suppose $\Lambda \circ \sqrt{\cdot}$ is concave (the convex case is similar, but with inequalities reversed).
Due to Lemma 4.1, we have $\Psi(t) \le \Lambda(t)$ for all $t \in \mathbb{R}$, which due to the definition of
the Legendre transform implies that $I_\sigma(w) = \Psi^*(w) \ge \Lambda^*(w) = I_\iota(w)$ for all $w \in \mathbb{R}$, thus
proving a. (and b. for the convex case).

Further assume the stronger condition of c., that $\Lambda \circ \sqrt{\cdot}$ is concave but not linear. Then,
for $w \in \mathbb{R}$ such that $\Lambda^*(w) < \infty$, let $t_w$ be as in (4.1), which is well defined due to the
non-degeneracy condition of Lemma 4.2. Then,

    I_\sigma(w) = \Psi^*(w) \ge t_w w - \Psi(t_w)
                            \ge t_w w - \Lambda(t_w)
                            = \Lambda^*(w) = I_\iota(w).

Due to Lemma 4.1, the second inequality above is an equality if and only if $t_w = 0$, which
occurs if and only if $(\Lambda^*)'(w) = 0$. Note that $\Lambda$ is symmetric, so $\Lambda^*$ is also symmetric
(by definition of the Legendre transform). Moreover, the smoothness of $\Lambda$ (see Lemma
4.2) implies the strict convexity of $\Lambda^*$ within its domain (see [17, Theorem 26.3]). Thus,
$(\Lambda^*)'(w) = 0$ if and only if $w = 0$. This yields the claim of part c. □

Remark 4.3. In this paper, we address the “atypical” nature of the directions $\iota^{(n)} = n^{-1/2}
(1, 1, \ldots, 1)$ associated with Cramér’s theorem for large deviations of product measures.
But in fact, the notions of atypicality and universal rate function extend beyond the product
case. In particular, the companion paper [12] establishes LDPs for random projections of
random vectors distributed according to the uniform measure on $\ell^p$ balls, again with a rate
function that coincides for $\sigma$-a.e. sequence of directions, and the sequence of directions
$\iota^{(n)} = n^{-1/2}(1, 1, \ldots, 1)$, $n \in \mathbb{N}$, can be shown to be atypical in that setting as well.

Acknowledgements. We would like to thank an anonymous referee for helpful feedback
on the exposition.